Abstract. Acquisition-to-acquisition signal intensity variations (non-standardness) are
inherent in MR images. Standardization is a post processing method for correcting
inter-subject intensity variations through transforming all images from the given image gray scale
into a standard gray scale wherein similar intensities achieve similar tissue meanings. The 
lack of a standard image intensity scale in MRI leads to many difficulties in tissue charac- 
terizability, image display, and analysis, including image segmentation. This phenomenon 
has been documented well; however, effects of standardization on medical image registration 
have not been studied yet. In this paper, we investigate the influence of intensity standard- 
ization in registration tasks with systematic and analytic evaluations involving clinical MR 
images. We conducted nearly 20,000 clinical MR image registration experiments and evalu- 
ated the quality of registrations both quantitatively and qualitatively. The evaluations show 
that intensity variations between images degrade registration accuracy.
The results imply that the accuracy of image registration not only depends on spatial and 
geometric similarity but also on the similarity of the intensity values for the same tissues in 
different images. 
1. Introduction

Image registration is an essential operation in a variety of medical imaging applications 
including disease diagnosis, longitudinal studies, data fusion, image segmentation, image 
guided therapy, volume reconstruction, pathology detection, and shape measurement ([10, 23]).
It is the process of finding a geometric transformation between a pair of scenes, the source
scene and the target scene, such that the similarity between the transformed source scene
(registered source) and target scene becomes optimum. There are many challenges in the
registration of medical images. Among these, those that stem from the artifacts associated 
with images include the presence of noise, interpolation artifacts, intensity non-uniformities, 
and intensity non-standardness. Although considerable research has gone into addressing the 
effects of noise ([7]), interpolation ([21, 8, 11]), and non-uniformity in image registration ([9]), 
little attention has been paid to studying the effects of image intensity standardization/
non-standardness in image registration. This aspect constitutes the primary focus of this paper.

MR image intensities do not possess a tissue specific numeric meaning even in images 
acquired for the same subject, on the same scanner, for the same body region, by using 
the same pulse sequence ([17, 4]). A registration algorithm must therefore not only capture
both large and small scale image deformations but also deal with global and local
image intensity variations. The lack of a standard and quantifiable interpretation of image
intensities may cause the geometric relationship between homologous points in MR images
to be affected considerably. Current techniques to overcome these differences/variations fall
into two categories. The first class of methods uses intensity modelling and/or attempts to 
capture intensity differences during the registration process. The second group constitutes 
post processing methods that are independent of registration algorithms. Notable studies 
that have attempted to solve this problem within the first class are ([6, 20, 1]). While 
global intensity differences are modelled with a linear multiplicative term in ([ ]), local 
intensity differences are modelled with basis functions. In ([21]), a locally affine but globally
smooth transformation model has been developed in the presence of intensity variations 
which captures intensity variations with explicitly defined parameters. In ([6]), intensities 
of one image are mapped into those of another via an adaptive transformation function. 
Although incorporating intensity modelling into the registration algorithms improves the 
accuracy, simultaneous estimation of intensity and geometric changes can be quite difficult 
and computationally expensive. 

The papers that belong to the second group of methods are ([17, 4, 19, 12, 14]) in which 
a two-step method is devised for standardizing the intensity scale in such a way that for the 
same MRI protocol and body region, similar intensities achieve similar tissue meaning. The 
methods transform image intensities non-linearly so that the variation of the overall mean 
intensity of the MR images within the same tissue region across different studies obtained on 
the same or different scanners is minimized significantly. Furthermore, the computational
cost of these methods is considerably lower than that of methods belonging to the
first class. Once tissue specific meanings are obtained, quantification and image analysis
techniques, including registration, segmentation, and filtering, become more accurate. 

The non-standardness issue was first demonstrated in ([17]) where a method was proposed 
to overcome this problem. The new variants of this method are studied in ([ ]). Numerical 
tissue characterizability of different tissues is achieved by standardization and it is shown 
that this can significantly facilitate image segmentation and analysis in ([4, 5]). Combined 
effects of non-uniformity correction and standardization are studied in ([12]) and the se- 
quence of operations to produce the best overall image quality is studied via an interplaying 
sequence of non-uniformity correction and standardization methods. In ([ ]), an improved 
standardization method based on the concept of generalized scale is presented. In ([ ]), the 
performance of standardization methods is compared with the known tissue characterizing 
property of magnetization transfer ratio (MTR) imaging and it is demonstrated that tissue
specific intensities may help characterize diseases.

The motivation for the research reported in this paper is the preliminary indication in ([2]) 
of the potential impact that intensity standardization may have on registration accuracy. 
Currently no published study exists that has examined how intensity non-standardness alone 
may affect registration. The goal of this paper is, therefore, to study the effect of non- 
standardness on registration in isolation. Toward this goal, first intensity non-uniformities 
are corrected in a set of images, and subsequently, they are standardized to yield a "clean 
set" of images. Different levels of non-standardness are then introduced artificially into 
these images which are then subjected to known levels of affine deformations. The clean 
set is also subjected to the same deformations. The deformed images with and without 
non-standardness are separately registered to clean images and the differences in their regis- 
tration accuracy are quantified to express the influence of non-standardness. The underlying 
methods are described in Section 2 and the analysis results are presented in Section 3.
Section 4 presents some concluding remarks.

2. Methods 

2.1. Notations and Overview. We represent a 3D image, called scene for short, by a pair
$\mathcal{C} = (C, f)$ where $C$ is a finite 3D array of voxels, called scene domain of $\mathcal{C}$, covering a body
region of the particular patient for whom image data $\mathcal{C}$ are acquired, and $f$ is an intensity
function defined on $C$, which assigns an integer intensity value to each voxel $v \in C$. We
assume that $f(v) \geq 0$ for all $v \in C$ and $f(v) = 0$ if and only if there are no measured data
for voxel $v$.

In dealing with standardization issues, the body region and imaging protocol need to 
be specified. All images that are analyzed for their dependence on non-standardness for 
registration accuracy are assumed to come from the same body region B and acquired as 
per the same acquisition protocol P. The non-standardness phenomenon is predominant 
mainly in MR imaging. Hence, all image data sets considered in this paper pertain to MRI.
However, the methods described here are applicable to any modality where this phenomenon 
occurs (such as radiography and electron microscopy). 

There are six main components to the methods presented in this paper: (1) intensity
non-uniformity correction, referred to simply as correction and denoted by an operator $\kappa$;
(2) intensity standardization denoted by an operator $\psi$; (3) an affine transformation of the
scene, denoted by $T$, used for the purpose of creating mis-registered scenes; (4) introduction
of artificial intensity non-standardness denoted by the operator $\bar{\psi}$; (5) an affine scene
transformation that is intended to register a scene with its mis-registered version; (6) evaluation
methods used to quantify the goodness of scene registration.

Superscripts $c$, $s$, $\bar{s}$, $t$ and $r$ are used to denote, respectively, the scenes resulting from
applying correction, standardization, introduction of non-standardness, mis-registration, and
registration operations to a given scene. Examples: $\mathcal{C}^{c} = \kappa(\mathcal{C})$; $\mathcal{C}^{cs} = \psi(\kappa(\mathcal{C}))$;
$\mathcal{C}^{cs\bar{s}} = \bar{\psi}(\mathcal{C}^{cs})$; $\mathcal{C}^{cs\bar{s}t} = T(\mathcal{C}^{cs\bar{s}})$. When a registration operation $T^{r}$ is applied to a scene $\mathcal{C}$, the
target scene to which $\mathcal{C}$ is registered will be evident from the context. The same notations
are extended to sets of scenes. For example, if $\mathcal{X}$ is a given set of scenes for body region $B$
and protocol $P$, then $\mathcal{X}^{cs\bar{s}} = \bar{\psi}(\mathcal{X}^{cs})$, where $\mathcal{X}^{cs} = \psi(\kappa(\mathcal{X}))$.

Our approach to study the effect of non-standardness on registration is as follows: 
(S1) Take a set $\mathcal{X}$ of scenes, pertaining to a fixed $B$ and $P$, but acquired from different subjects
in routine clinical settings.

(S2) Apply correction followed by standardization to the scenes in $\mathcal{X}$ to produce the set
$\mathcal{X}^{cs}$ of clean scenes. $\mathcal{X}^{cs}$ is as free from non-uniformities, and more importantly, from
non-standardness, as we can practically make it. As justified in ([ ]), the best order and sequence
of these operations to employ in terms of reducing non-uniformities and non-standardness
is $\kappa$ followed by $\psi$. This is mainly because any correction operation introduces its own
non-standardness.

(S3) Apply different known levels of non-standardness to the scenes in $\mathcal{X}^{cs}$ to produce the
set $\mathcal{X}^{cs\bar{s}}$.

(S4) Apply different known levels of affine deformations $T$ to the scenes in $\mathcal{X}^{cs\bar{s}}$ to form the
scene set $\mathcal{X}^{cs\bar{s}t}$. Apply the same deformations to the clean scenes in the set $\mathcal{X}^{cs}$ to create
$\mathcal{X}^{cst}$. In this manner, for any scene $\mathcal{C}^{cs} \in \mathcal{X}^{cs}$, we have the same scene after applying some
non-standardness and the same deformation $T$, namely $\mathcal{C}^{cs\bar{s}t}$.

(S5) Register each scene $\mathcal{C}^{cs} \in \mathcal{X}^{cs}$ to $\mathcal{C}^{cst} \in \mathcal{X}^{cst}$ and determine the required affine deformation
$T_{s}$ (the subscript $s$ indicates "standardized"). Similarly, register each $\mathcal{C}^{cs} \in \mathcal{X}^{cs}$ to $\mathcal{C}^{cs\bar{s}t} \in \mathcal{X}^{cs\bar{s}t}$
and determine the affine deformation $T_{ns}$ ($ns$ for "not standardized") needed.

(S6) Analyze the deviations of $T_{s}$ and $T_{ns}$ from the true applied transformation $T$ over all
scenes and as a function of the applied level of non-standardness and affine deformations.

In the rest of this section, steps S1-S6 are described in detail. 
S1. Data Sets

Two separate sets of image data (i.e., two sets $\mathcal{X}$) are used in this study, both brain MR
images of patients with Multiple Sclerosis, one of them being a T2 weighted acquisition, and
the other, a proton density (PD) weighted set, with the following acquisition parameters:
Fast Spin Echo sequence, 1.5T GE Signa scanner, TR = 2500 msec, voxel size $0.86 \times 0.86 \times 3$
mm$^3$. Each of the two sets is composed of 10 scenes. Since the two data sets for each patient
are acquired in the same session with the same repetition time but by capturing different
echoes (TE = 18 msec, 96 msec), the T2 and PD scenes for each patient can be assumed to be
in registration.

S2. Non-uniformity Correction, Standardization 

For non-uniformity correction, we use the method based on the concept of local morphometric
scale called g-scale ([15]). Built on fuzzy connectedness principles, the g-scale at a
voxel $v$ in a scene $\mathcal{C}$ is the largest set of voxels fuzzily connected to $v$ in the scene such
that all voxels in this set satisfy a predefined homogeneity criterion. Since the g-scale set
represents a partitioning of the scene domain C into fuzzy connectedness regions by using 
a predefined homogeneity criterion, resultant g-scale regions are locally homogeneous, and 
spatial contiguity of this local homogeneity is satisfied within the g-scale region. 

g-scale based non-uniformity correction is performed in a few steps as follows. First, g- 
scale for all foreground voxels is computed. Second, by choosing the largest g-scale region, 
background variation is estimated. Third, a correction is applied to the entire scene by 
fitting a second order polynomial to the estimated background variation. These three steps 
are repeated iteratively until the largest g-scale region found is not significantly larger than 
the previous iteration's largest g-scale region. 

Standardization is a pre-processing technique which non-linearly maps the image intensity gray
scale into a standard intensity gray scale through a training and a transformation step. In the
training step, a set of images acquired for the same body region $B$ as per the same protocol
P are given as input to learn histogram-specific parameters. In the transformation step, any 
given image for B and P is standardized with the estimated histogram-specific landmarks 
obtained from the training step. In the data sets considered for this study, B = Head and 
P represents two different protocols, namely T2 and PD. The training and transformation 
steps are done separately for the two protocols. 

The basic premise of standardization methods is that, in scenes acquired for a given (B, P), 
certain tissue-specific landmarks can be identified on the histogram of the scenes. Therefore, 
by matching the landmarks, one can standardize the gray scales. Median, mode, quartiles, 
and deciles, and intensity values representing the mean intensity in each of the largest few 
g-scale regions have been used as landmarks. Additionally, to handle outliers, a "low"
and a "high" intensity value (selected typically at the 0 and 99.8 percentiles) are also selected as
landmarks.

In the training step, the landmarks are identified for each training scene specified for (B, P) 
and intensities corresponding to the landmarks are mapped into an assumed standard scale. 
The mean values for these mapped landmark locations are computed. In the transformation 
step, the histogram of each given scene C to be standardized is computed, and intensities 
corresponding to the landmarks are determined. Sections of the intensity scale of C are 
mapped to the corresponding sections of the standard scale linearly so that corresponding 
landmarks of scene C match the mean landmarks determined in the training step. (The 
length of the standard scale is chosen in such a manner that the overall mapping is always 
one-to-one and no two intensities in C map into a single intensity on the standard scale.) 
Note that the overall mapping is generally not a simple linear scaling process but, indeed, 
a non-linear (piecewise linear) operation; see ([17, 19]) for details. In the present study,
standardization is done separately for T2 and PD scenes. 
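
The training and transformation steps described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it assumes only three landmarks (the "low" percentile, the median, and the "high" percentile), takes each scene as a flat list of foreground intensities, and assumes the landmarks of every scene are distinct. The function names `landmarks`, `train`, and `standardize` are hypothetical.

```python
from bisect import bisect_right

def percentile(sorted_vals, q):
    """Linearly interpolated q-th percentile (0-100) of an ascending list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def landmarks(scene, percentiles):
    """Histogram landmarks of one scene (assumed: foreground voxels only)."""
    vals = sorted(scene)
    return [percentile(vals, q) for q in percentiles]

def piecewise_linear(x, src, dst):
    """Map x section by section so that src[i] is sent to dst[i]."""
    i = min(max(bisect_right(src, x) - 1, 0), len(src) - 2)
    t = (x - src[i]) / (src[i + 1] - src[i])
    return dst[i] + t * (dst[i + 1] - dst[i])

def train(scenes, percentiles=(0.0, 50.0, 99.8), s1=1.0, s2=4095.0):
    """Training step: mean landmark locations on the standard scale [s1, s2]."""
    mapped = []
    for scene in scenes:
        lm = landmarks(scene, percentiles)
        p1, p2 = lm[0], lm[-1]
        # Map [p1, p2] linearly onto [s1, s2] and record where landmarks land.
        mapped.append([s1 + (v - p1) * (s2 - s1) / (p2 - p1) for v in lm])
    return [sum(col) / len(mapped) for col in zip(*mapped)]

def standardize(scene, standard_landmarks, percentiles=(0.0, 50.0, 99.8)):
    """Transformation step: match the scene's landmarks to the mean landmarks."""
    lm = landmarks(scene, percentiles)
    return [round(piecewise_linear(v, lm, standard_landmarks)) for v in scene]

# Two toy "scenes" that differ only by a global intensity factor.
scene_a = list(range(0, 101))
scene_b = [2 * v for v in scene_a]
pcts = (0.0, 50.0, 100.0)
std = train([scene_a, scene_b], percentiles=pcts)
print(standardize(scene_a, std, pcts) == standardize(scene_b, std, pcts))
```

In this toy case the two scenes become identical after standardization, which is the behaviour the method aims for when the same tissue appears at different raw intensities.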
S3. Applying Non-standardness 

To artificially introduce non-standardness into a clean scene $\mathcal{C}^{cs} = (C, f^{cs})$, we use the idea
of the inverse of the standardization mapping described in ([12]). A typical standardization
mapping is shown in Figure 1. In this figure, only three landmarks are considered: "low" and
"high" intensities ($p_1$ and $p_2$) and the median ($\mu$) corresponding to the foreground object.
There are two linear mappings: the first from $[p_1, \mu]$ to $[s_1, \mu_s]$ and the second from $[\mu, p_2]$
to $[\mu_s, s_2]$. $[s_1, s_2]$ denotes the standard scale. The horizontal axis denotes the non-standard
input scene intensity and the vertical axis indicates the standardized output scene intensity. In
inverse mapping, since $\mathcal{C}^{cs}$ has already been standardized, the vertical axis can be considered
as the input scene intensity, $f^{cs}(v)$, and the horizontal axis can be considered as the output
scene intensity, $f^{cs\bar{s}}(v)$, where mapping the clean scene through varying the slopes $m_1$ and
$m_2$ results in non-standard scenes. By using the values of $m_1$ and $m_2$ within the range of
variation observed in initial standardization mappings of corrected scenes, the non-standard
scene intensities can be obtained by

(1)

where $[\cdot]$ converts any number $y \in \mathbb{R}$ to the closest integer $Y$, and $\mu_s$ denotes the median
intensity on the standard scale.
Figure 1. The standardization transformation function for inverse mapping
with the various parameters shown. 

In order to keep the number of registration experiments manageable, this simple model,
which involves only two variables $m_1$ and $m_2$, was used. Even so, as described later, this
study entails nearly 20,000 registration experiments.
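
The two-slope inverse mapping can be illustrated with a short sketch. The exact form of Equation (1) is not reproduced here; the code below is an assumption that simply inverts a two-segment standardization map whose segments have slopes $m_1$ and $m_2$ and meet at $\mu_s$. The function name `add_nonstandardness` and the parameter `p1` (the non-standard intensity that corresponds to $s_1$) are hypothetical.

```python
def add_nonstandardness(f_cs, m1, m2, mu_s, s1=1, p1=0):
    """Map standardized intensities back to a non-standard scale.

    Assumed model (not the paper's Equation (1)): the standardization map
    has slope m1 on [p1, mu] and slope m2 above mu, so its inverse divides
    by the slopes. Voxels with intensity 0 carry no data and stay 0.
    """
    if m1 <= 0 or m2 <= 0:
        raise ValueError("slopes must be positive")
    mu = p1 + (mu_s - s1) / m1  # non-standard intensity that maps to mu_s
    out = []
    for v in f_cs:
        if v == 0:
            out.append(0)
        elif v <= mu_s:
            out.append(round(p1 + (v - s1) / m1))
        else:
            out.append(round(mu + (v - mu_s) / m2))
    return out

# Slopes of 2 compress the standard scale [1, 4095] to roughly half its length.
print(add_nonstandardness([0, 1, 2049, 4095], m1=2.0, m2=2.0, mu_s=2049))
# [0, 0, 1024, 2047]
```

Choosing different slope pairs for the same clean scene yields scenes in which the same tissue appears at different intensities, while the geometry is left untouched.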
S4. Applying Affine Deformations

All components of the affine transformation (rotations about all three axes and translation,
scaling, and shear in all three directions) are taken into account in creating scene sets $\mathcal{X}^{cst}$
and $\mathcal{X}^{cs\bar{s}t}$.
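
As an illustration of how such a deformation can be assembled, the sketch below composes a 4x4 homogeneous matrix from translation, rotation, scaling, and shear. The composition order (scale, then shear, then rotations about x, y, z, then translation) and the use of one value per component for all three axes are assumptions made for the example; the source does not state them. All function names are hypothetical.

```python
import math

def identity():
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(t):
    m = identity()
    for i in range(3):
        m[i][3] = t  # same displacement along x, y and z
    return m

def scaling(s):
    m = identity()
    for i in range(3):
        m[i][i] = s  # same factor along x, y and z
    return m

def shearing(h):
    m = identity()
    m[0][1] = m[0][2] = m[1][2] = h  # upper-triangular shear terms
    return m

def rotation(axis, degrees):
    """Rotation about axis 0 (x), 1 (y) or 2 (z)."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    i, j = [(1, 2), (2, 0), (0, 1)][axis]
    m = identity()
    m[i][i], m[i][j], m[j][i], m[j][j] = c, -s, s, c
    return m

def affine(t=0.0, degrees=0.0, s=1.0, h=0.0):
    """Compose translation * Rz * Ry * Rx * shear * scale (assumed order)."""
    m = matmul(shearing(h), scaling(s))
    for axis in range(3):
        m = matmul(rotation(axis, degrees), m)
    return matmul(translation(t), m)

def apply(m, p):
    """Apply a 4x4 homogeneous matrix to a 3-D point."""
    x, y, z = p
    return tuple(m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3]
                 for i in range(3))

# A deformation with non-zero values in every component.
T = affine(t=5.0, degrees=2.0, s=1.05, h=0.01)
print(apply(T, (0.0, 0.0, 0.0)))
```

With all four components at their neutral values (`affine()`), the result is the identity, which corresponds to the case in which no deformation is applied.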

S5. Scene Registration

The algorithm that determines the affine transformation matrix by minimizing the sum of
squared scene intensity differences as described in ([ ]) is used in this step. A separate
transformation matrix is found for registering each $\mathcal{C}^{cs}$ to $\mathcal{C}^{cst}$, resulting in $T_{s}$, and $\mathcal{C}^{cs}$ to
$\mathcal{C}^{cs\bar{s}t}$, resulting in $T_{ns}$.
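
To make the similarity criterion concrete, the sketch below evaluates the sum of squared differences (SSD) and uses it in a toy 1-D search over integer shifts. The exhaustive shift search is only a stand-in for the affine optimization performed by the registration software; `ssd`, `shift`, and `best_shift` are hypothetical names.

```python
def ssd(source, target):
    """Sum of squared intensity differences between two equal-length signals."""
    if len(source) != len(target):
        raise ValueError("signals must have the same length")
    return sum((s - t) ** 2 for s, t in zip(source, target))

def shift(signal, k):
    """Shift a 1-D signal right by k samples, padding with zeros."""
    n = len(signal)
    return [signal[i - k] if 0 <= i - k < n else 0 for i in range(n)]

def best_shift(source, target, max_shift=3):
    """Integer shift of `source` that minimizes SSD against `target`."""
    return min(range(-max_shift, max_shift + 1),
               key=lambda k: ssd(shift(source, k), target))

target = [0, 0, 0, 10, 20, 10, 0, 0]
source = shift(target, -1)           # same structure, displaced by one sample
print(best_shift(source, target))    # 1

# Even with perfect alignment, an intensity rescaling leaves a non-zero SSD.
rescaled = [2 * v for v in target]
print(ssd(rescaled, target))         # 600
```

The second result shows why SSD is sensitive to non-standardness: the cost no longer reaches zero at the correct alignment once the intensities of the two scenes differ.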

S6. Evaluation

Two types of tests were carried out: accuracy and consistency. The goal of the accuracy test
was to determine how close the recovered registration transformations are to the known true
transformations. The aim of the consistency test was to check whether the observed accuracy
behavior occurs consistently when different accuracy tests are conducted. In each test,
two transformations $T_{s}$ and $T_{ns}$ are compared by using the methodology that is described
in ([18]) and summarized below. Let $\mathcal{C}_{T2} \in \mathcal{X}_{T2}$ and $\mathcal{C}_{PD} \in \mathcal{X}_{PD}$ be the T2 and PD scenes of any
particular patient. Let $T_{s,x}$ and $T_{ns,x}$, $x \in \{T2, PD\}$, be the transformations obtained in Step
S5 by matching $\mathcal{C}_x^{cs}$ to $\mathcal{C}_x^{cst}$ and to $\mathcal{C}_x^{cs\bar{s}t}$, respectively. In the accuracy tests, $T_{s,x}$ and $T_{ns,x}$
are compared with the true (known) transformation $T$ over all levels of non-standardness
and deformations that were employed. Both $T_{s,x}$ and $T_{ns,x}$ are expected to be the same as $T$.
In the consistency tests, $T_{s,T2}$ with $T_{s,PD}$ and $T_{ns,T2}$ with $T_{ns,PD}$ are compared over all levels
of non-standardness and deformations that were applied, and they are expected to be equal
because PD and T2 scenes of the same patient are already in registration as described in
S1. We measure the error between two transformations (in both the above scenarios) by the
root-mean-squared error (RMSE) for the eight corner voxels of the box that approximately
bounds the head, i.e., the volume of interest in our application.
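
The corner-based error measure can be sketched as follows. The sketch assumes that the RMSE is the square root of the mean squared Euclidean distance between the images of the eight box corners under the two transformations, and that transformations are given as 4x4 homogeneous matrices; the box dimensions and the names `apply_affine` and `corner_rmse` are illustrative only.

```python
import math
from itertools import product

def apply_affine(m, p):
    """Apply a 4x4 homogeneous matrix (nested lists) to a 3-D point."""
    x, y, z = p
    return tuple(m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3]
                 for i in range(3))

def corner_rmse(t_a, t_b, box_min, box_max):
    """RMSE between two transformations over the 8 corners of a box."""
    corners = list(product(*zip(box_min, box_max)))
    squared = 0.0
    for c in corners:
        a, b = apply_affine(t_a, c), apply_affine(t_b, c)
        squared += sum((u - v) ** 2 for u, v in zip(a, b))
    return math.sqrt(squared / len(corners))

IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
SHIFTED = [[1, 0, 0, 3], [0, 1, 0, 4], [0, 0, 1, 0], [0, 0, 0, 1]]

# A pure translation by (3, 4, 0) moves every corner by 5 units.
print(corner_rmse(IDENTITY, SHIFTED, (0, 0, 0), (200, 240, 150)))  # 5.0
```

In the accuracy test one of the two matrices would be the known transformation $T$; in the consistency test the two matrices would be the transformations recovered for the T2 and PD scenes of the same patient.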

In the accuracy test, for a given level of applied non-standardness and affine deformation,
we get 20 pairs of RMSE values, each pair indicating how close $T_{s,x}$ and $T_{ns,x}$ are to the
true transformation $T$. A paired t-test is conducted to compare the accuracy of the two
transformations based on RMSE values. The outcome of this test will be that either one of
the two transformations is more accurate than the other or there is no significant difference
between the two (throughout we use $P < 0.05$ to indicate statistical significance). The set
of all levels of applied deformations is divided into three groups - small, medium, large.
For each of 8 levels of applied non-standardness and under each of these three groups, the
number of occurrences of wins ($w$), losses ($l$) and non-significant differences ($n$) are counted
for $T_{ns}$ over $T_{s}$. The number of wins and losses is normalized to get values in the range [0,1]:
$W_x = \frac{w}{w+l+n}$; $L_x = \frac{l}{w+l+n}$. A particular configuration of wins and losses can be identified by
a point with coordinates $(W_x, L_x)$ in a win-loss triangle as in Figure 2. Large values of $L_d$
and small values of $W_d$ indicate that the performance point is closer to the point (1,0) of
all-wins. The following metric is used to express the "goodness" value of the configuration.

(2)

The procedure in the consistency test is similar to the above except that we have 10 pairs
of RMSE values to compare and these values are obtained not by using the known true
transformation but by using $T_{s,PD}$ for $T_{s,T2}$ and $T_{ns,PD}$ for $T_{ns,T2}$.

Figure 2. Mapping procedure for "goodness" value in the normalized win-loss
$W_x$-$L_x$ plane. The corners (0,1) and (1,0) correspond to all-losses and all-wins, respectively.
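
The win/loss bookkeeping described above can be sketched as follows. The sketch covers only the counting and the normalization to $(W_x, L_x)$; the goodness metric of Equation (2) is not reproduced. It assumes that each paired t-test has already been summarized as a tuple (mean RMSE of $T_{ns}$, mean RMSE of $T_{s}$, P value); the names `tally` and `win_loss_point` are hypothetical.

```python
def tally(results, alpha=0.05):
    """Count wins, losses and non-significant outcomes for T_ns over T_s.

    Each result is (mean_rmse_ns, mean_rmse_s, p_value). A win means the
    non-standard registration has a significantly lower RMSE.
    """
    w = l = n = 0
    for rmse_ns, rmse_s, p in results:
        if p >= alpha:
            n += 1
        elif rmse_ns < rmse_s:
            w += 1
        else:
            l += 1
    return w, l, n

def win_loss_point(w, l, n):
    """Normalized coordinates (W_x, L_x) in the win-loss triangle."""
    total = w + l + n
    if total == 0:
        raise ValueError("no comparisons to normalize")
    return w / total, l / total

# Four illustrative (made-up) comparison outcomes.
outcomes = [(1.0, 2.0, 0.01), (3.0, 2.0, 0.01), (2.0, 2.1, 0.50), (5.0, 1.0, 0.001)]
w, l, n = tally(outcomes)
print((w, l, n))                  # (1, 2, 1)
print(win_loss_point(w, l, n))    # (0.25, 0.5)
```

Because $W_x + L_x \leq 1$, every configuration falls inside the triangle with corners (0,0), (1,0), and (0,1).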
3. Experimental Results



3.1. Implementation Details. 

3.1.1. Correction and Standardization. These operations are carried out by using the 3DVIEWNIX
software ([22]). Based on the experiments in ([17, 4, 19]), minimum and maximum percentile
values are set to $pc_1 = 0$ and $pc_2 = 99.8$, respectively. In the standard scale, $s_1$ and $s_2$ are
set to $s_1 = 1$ and $s_2 = 4095$. Figure 3 shows the original, corrected, and standardized (after
correction) versions of two PD and two T2-weighted slices taken from two different studies in the first,
second and third rows, respectively. The gain in the similarity of resulting image intensities
for similar tissue types can be readily seen.




Figure 3. Two slices from PD (first and second columns) and two slices 
from T2-weighted scenes (third and fourth columns) selected from two different
studies before correction and standardization are displayed at default windows 
in the first row. Corresponding slices of the g-scale corrected scenes are shown 
in the second row. Clean scenes obtained after the standardization process of 
the corresponding corrected scenes are displayed at a standard window in the 
third row. 



3.1.2. Adding known levels of non-standardness. We combine eight different ranges of the
slopes $m_1$ and $m_2$ to introduce small, medium, and large scale non-standardness into the
scenes. This means that, for each clean scene, we obtain eight scenes, one of which is
the default clean scene itself, two scenes consisting of small scale non-standardness, two
scenes consisting of medium scale non-standardness, and three scenes consisting of large
scale non-standardness. The ranges of applied non-standardness are given in Table 1. We
have arrived at these values by examining the training part of the standardization process
through computing the ranges of the slopes $m_1$ and $m_2$ that are utilized in standardizing
the corrected scenes. Figures 4 and 5 illustrate the process of introducing known levels of
non-standardness into the clean slices of a PD and a T2-weighted scene utilized in our study.
In both figures, the first display shows the original clean slice and the rest show the resulting
non-standard slices.

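Table 1. Description of the different ranges of the slopes $m_1$, $m_2$ for introducing
artificial non-standardness.

Function          Range                          Description
$\bar{\psi}_1$    $\{0.9 < m_1, m_2 < 1.5\}$     Small Scale
$\bar{\psi}_2$    $\{0.6 < m_1, m_2 < 0.9\}$     Small Scale
$\bar{\psi}_3$    $\{1.5 < m_1, m_2 < 2.0\}$     Medium Scale
$\bar{\psi}_4$    $\{2.0 < m_1, m_2 < 2.4\}$     Medium Scale
$\bar{\psi}_5$    $\{2.4 < m_1, m_2 < 2.7\}$     Large Scale
$\bar{\psi}_6$    $\{2.7 < m_1, m_2 < 3.0\}$     Large Scale
$\bar{\psi}_7$    $\{3.0 < m_1, m_2 < 3.3\}$     Large Scale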




Figure 4. First image in the first row is a slice of a clinical PD weighted clean
scene of the brain. The other slices are obtained by adding the 7 different levels
of non-standardness, $\bar{\psi}_1$ to $\bar{\psi}_7$, into the clean scene. All images are displayed
at the fixed gray level window chosen for the clean scene.



3.1.3. Applying known amounts of deformation. Three different rotations (0, medium, and
large angle), three translations (0, medium, and large displacement), three levels of scaling
(0, medium, and large), and three levels of shearing (0, medium, and large) are combined
to introduce 81 different known levels of deformation such that for all non-zero
transformations, all three directions/axes are involved. Table 2 summarizes the amount of these
transformations used for each axis in the three different groups.

Figure 5. First image in the first row is a slice of a clinical T2 weighted clean
scene of the brain. The other slices are obtained by adding the 7 different
levels of non-standardness, $\bar{\psi}_1$ to $\bar{\psi}_7$, into the clean scene. All images are
displayed at the fixed gray level window chosen for the clean scene.

Table 2. The amount of deformations corresponding to different groups of
transformations.

Transformation Type   Zero       Medium     Large
Translation           0 pixels   5 pixels   20 pixels
Rotation              0°         2°         6°
Scaling               1          1.05       1.15
Shearing              0          0.01       0.05



3.1.4. Registration. We use the scene based affine registration method made available in
the SPM software ([ ]). The algorithm determines the affine transformation matrix that
optimally registers the two scenes by minimizing the sum of squared differences between
scene intensities.

3.2. Results. For each of the 20 clean scenes in $\mathcal{X}_{T2} \cup \mathcal{X}_{PD}$, considering the 7 different levels
of non-standardness together with one level of standardness (i.e., a total of 8 levels) and
$3 \times 3 \times 3 \times 3 = 81$ different levels of mis-registration, there will be $8 \times 81 = 648$ scenes.
Thus, in the accuracy test, there will be $20 \times 648 = 12{,}960$ registration experiments. In the
consistency test, similarly, there will be 6,480 additional registration experiments. These
additional experiments can be considered a validation for accuracy tests because they show
how consistent the accuracy of the registration experiments is by using the fact that T2
and PD scenes are in registration.
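
The experiment counts above can be checked by enumerating the design directly. The sketch below is only an illustration of the arithmetic; the level labels are placeholders, and the count of 10 patients for the consistency test follows from the 10 scenes per protocol stated in S1.

```python
from itertools import product

LEVELS = ("zero", "medium", "large")

# One level each for translation, rotation, scaling and shearing.
deformations = list(product(LEVELS, repeat=4))
intensity_levels = 8          # clean scene plus 7 levels of non-standardness
clean_scenes = 20             # 10 T2 scenes and 10 PD scenes
patients = 10                 # T2/PD pairs used in the consistency test

per_scene = intensity_levels * len(deformations)
accuracy_runs = clean_scenes * per_scene
consistency_runs = patients * per_scene

print(len(deformations))                  # 81
print(per_scene)                          # 648
print(accuracy_runs)                      # 12960
print(consistency_runs)                   # 6480
print(accuracy_runs + consistency_runs)   # 19440
```

The total of 19,440 runs is the "nearly 20,000" registration experiments referred to throughout the paper.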

The results of the comparison experiments are reported in Tables 3 and 4 for accuracy 
and consistency tests, respectively, for 7 sets of non-standard scenes with respect to the 
registration performance of clean scenes. The tables summarize the effectiveness of the 
registrations for each type of deformation recovered. The goodness values indicate that the 
ability to recover known deformations from transformed scenes is lower if intensity variations 
between source and target scenes are large. A goodness value $\gamma < 1$ for scenes with
non-standardness indicates that the registration between clean scenes outperforms registration
between scenes with certain levels of non-standardness, and this is true only if $W_x < L_x$.

Table 3. Comparison of methods for Accuracy. The Goodness values $\gamma$ are
listed. Types of non-standardness are indicated by $\bar{\psi}_1, \ldots, \bar{\psi}_7$, and the types of
affine deformations are indicated by small, medium, and large in the columns.

Type of Non-Standardness   Small    Medium   Large    Total
$\bar{\psi}_1$             1        0.8222   0.6562   0.7811
$\bar{\psi}_2$             0.9400   0.8305   0.6167   0.7716
$\bar{\psi}_3$             0.9369   0.7751   0.6309   0.7651
$\bar{\psi}_4$             0.9318   0.7004   0.6048   0.7565
$\bar{\psi}_5$             0.8806   0.6004   0.5622   0.6254
$\bar{\psi}_6$             0.7565   0.5511   0.5341   0.5881
$\bar{\psi}_7$             0.7447   0.5901   0.5051   0.5819



Table 4. Comparison of methods for Consistency. The Goodness values $\gamma$ are
listed. Types of non-standardness are indicated by $\bar{\psi}_1, \ldots, \bar{\psi}_7$, and the types of
affine deformations are indicated by small, medium, and large in the columns.

Type of Non-Standardness   Small    Medium   Large    Total
$\bar{\psi}_1$             1.3427   0.9289   0.7423   1
$\bar{\psi}_2$             1.1491   1        0.8039   0.9530
$\bar{\psi}_3$             1        0.8039   0.7423   0.8417
$\bar{\psi}_4$             0.8622   0.7447   0.5434   0.7062
$\bar{\psi}_5$             0.9289   0.6722   0.5023   0.6934
$\bar{\psi}_6$             0.8636   0.6890   0.5023   0.6722
$\bar{\psi}_7$             1.0752   0.7097   0.2998   0.6468



A likely reason for the better performance of clean scenes with respect to the
non-standard scenes is that structural information for the same subject at different
non-standardness levels is not the same. Therefore, correlations of the intensity values for each
structure in the scenes may not reach the optimum to which the registration algorithm
converges. Registration parameters are obtained through maximizing the similarity of two
scenes, and Tables 3 and 4 indicate that correlation of the intensities is
highest when each structure in the scene has a fixed tissue specific meaning.

Another possible reason is that the relationship between voxel intensities may be
non-linear. Since the introduction of non-standardness is itself a non-linear process, the similarity
function is likely to be affected in the form of local fluctuations, which
may lead not only to less accurate registration results but also to the
optimization process getting trapped at local optima. The opposite situation may happen
as well, especially for large scale deformations; the registration algorithm may easily fail
regardless of the standardization level of the scenes (see Figure 6 (a) for a failing example
of clean scenes). Local fluctuations in the similarity measure due to non-standardness may
lead to different optimum points depending on the degree of non-standardness, some of
which may even improve registration, as shown in Figure 6 (b). Although the registration
algorithm did not get stuck in the latter case, the registration accuracy was
not high, especially in terms of translation parameters. In order to cope with possible failing
examples in registration, we ran the registration algorithm with proper initial estimation of
the transformation matrix rather than the default identity transformation matrix used in all
experiments. Figure 6 (c) shows the registered and transformed clean scenes overlaid, where
fuzziness in gray scale images demonstrates the misalignment. Compared to the registration
of the non-standard scene in Figure 6 (b), the performance of the registration of the clean scene in
this example is still better when the initial estimation of the transformation matrix is given as
input to the registration process.




Figure 6. (a) An example of registration failure for a clean scene with a large
deformation. (b) An example of registration success for a non-standard scene
for the same amount of deformation as in the example in (a). (c) If a proper
initialization matrix is given as input to the registration algorithm, clean scene
registration performance becomes better than the example for the non-standard
scene.

Based on the fact that similarity of a pair of registered clean scenes is higher than the 
similarity of a pair of registered non-standard scenes, it can be deduced that substantially 
improved uniformity of tissue meaning between two scenes of the same subject being regis- 
tered improves registration accuracy. Our experimental results demonstrate that scenes are 
registered better whenever the same tissues are represented by the same intensity levels. 




We note that, in both tables, most of the entries are less than 1. This indicates that in 
both accuracy and consistency tests, the standardized scene registration task wins over the 
registration of non-standard scenes. Table 3 on its own does not convey any information 
about what the actual accuracies in the winning cases are, or about whether the win happens 
for T2 scenes only, PD only, or for both. The fact that a majority of the corresponding cells 
in these tables both indicate wins suggests that accuracy-based wins happen for both T2 and 
PD scenes. Conversely, a favorable $\gamma$ value in Table 4 does not convey any information about
whether the high consistency indicated also signals accuracy. Thus, accuracy and consistency 
are to some extent independent factors, and they together give us a more complete picture 
of the influence of non-standardness on registration. 

4. Concluding Remarks 

We described a controlled environment for determining the effects of intensity standardization
on registration tasks in which the best image quality (clean scene) was established by the
sequence of correction operation followed by standardization. We introduced several differ- 
ent levels of non-standardness into the clean scenes and performed nearly 20,000 registration 
experiments for small, medium and large scale deformations. We compared the registration 
performance of clean scenes with the performance of scenes with non-standardness and sum- 
marized the resulting goodness values. From overall accuracy and consistency test results 
in Tables 3 and 4, we conclude that intensity variation between scenes degrades registration 
performance. Having tissue specific numeric meaning in intensities maximizes the similarity 
of images which is the essence of the optimization procedure in registration. Standardization 
is therefore strongly recommended in the registration of images of patients in any longitudi- 
nal and follow up study, especially when image data come from different sites and different 
scanners of the same or different brands. 

In this paper, we took a first step toward addressing the potential influence of intensity
non-standardness on registration. This is indeed a small segment of the much larger full
problem: Unlike the specific intra-modality (or intra protocol) registration task considered
here, there are many situations in which the source and the target images may be from
different modalities or protocols (e.g., CT to MRI, PET to MRI, and T1 to T2 registration),
and each such situation may have its own theme of non-standardness. Further, these
themes may depend on the body region, the scanner, and its brand. A
full consideration of these aspects is beyond the scope of this paper. Since the sum
of squared differences is one of the most appropriate similarity metrics for intra-modality 
registration, we focused on this metric in our study. But, clearly, more studies of this type 
in the more general settings mentioned above are needed. 

Thus far, we controlled the computational environment via two factors: standardization 
and correction. A third important factor, noise, can be also embedded into the framework. 
It is known that correction itself introduces non-standardness into the scenes and it also 
enhances noise. The interrelationship between correction and noise suppression
algorithms and the proper order for these operations have been studied
recently ([16]). A question immediately arises as to how standardization affects registration
accuracy for different orders of correction and noise filtering. Based on the study in ([16]),
we may conclude that non-uniformity correction should precede noise suppression and that
standardization should be the last operation among the three to obtain best image quality.
However, it remains unclear as to how a combination of deterministic methods (standardization
and correction) affects a random phenomenon like noise. It is thus important to study
these three phenomena in the future on their own or in relation to how they may influence 
the registration process, especially in multi-center studies wherein data come from different 
scanners and brands of scanners. 